Antipattern: Leave Out the Constraints

Explore the impacts of leaving the constraints out of the DB table design.

Even though skipping foreign key constraints may seem at first to make our database design simpler, more flexible, and even speedier, we pay for this in other ways: it makes it our responsibility to write code to ensure referential integrity manually.

Assuming flawless code#

Many people’s solution for referential integrity is to write application code so that data relationships are always satisfied. Every time they insert a row, they have to make sure that values in foreign key columns reference existing values in the referenced table. Every time they delete a row, they have to make sure that any child tables are also updated appropriately. In other words, the popular answer is simply to “make no mistakes”.

Furthermore, to avoid referential integrity mistakes when they have no foreign key constraints, they’d have to run extra SELECT queries before they apply the changes, to ensure that the change won’t result in broken references. For instance, to insert a new row, they’d need to make sure that the parent row exists:

Check for the parent row before entering the data

They would then add a bug that references it:

Inserting values in Bugs table

To delete a row, they’d have to make sure no child rows exist:

Checking if any child row exists

Only after running these steps could they then delete the account.

Deleting the account

What if a user with account_id 1 sneaks in and enters a new bug at the moment right after the query is made and before the account is deleted? This may seem unlikely, but as Gordon Letwin, architect of DOS 4, famously said, “One in a million is next Tuesday.” If this happens, it’d leave a broken reference — a bug reported by an account that no longer exists.

The only remedy is for the programmer to explicitly lock the Bugs table while they’re checking it and unlock it after they have finished deleting the account. Any architecture that requires that kind of locking will never do well when high concurrency and scalability are required.

Checking for mistakes#

The anti-solution described in the story in this chapter uses developer-written scripts to report corrupted data.

For example, in our bugs database, the Bugs.status column references the lookup table BugStatus. To find bugs with an invalid status value, we could use a query like the following:

Finding bugs with invalid status value

We can imagine that we’d have to write a similar query for every referential relationship in our database.

If we find ourselves in the habit of checking for broken references like this, our next question is, how often do we need to run these checks? Running hundreds of checks every day, or even more frequently, becomes quite a chore.

What happens when we do find a broken reference? Can we correct it? Sure, we can… sometimes. For instance, we might change an invalid bug status value to a sensible default.

Updating Bugs table

Let’s try to run the query in the playground given below:

Updating value of status in Bugs table

Inevitably, there are other cases where we can’t synthesize data to correct these kinds of mistakes. For example, the Bugs.reported_by column should reference the account of the user who reported the given bug, but if this value is invalid, which user’s account should we use as a replacement?

Handling user changes#

It’s pretty unlikely that all our code touching the database is perfect. We could easily perform similar database updates in several functions in our application. When we have to change the code, how can we be sure that we’ve applied compatible changes to every case in our application?

We may also have users applying changes directly to the database using an SQL query tool or through private scripts. It’s easy to introduce broken references through ad hoc SQL statements. We should assume this will happen at some point in the lifespan of our application.

We need the database to be consistent — that is, we need to be able to depend on the references in our database being satisfied at all times. But we can never be certain that all applications and scripts that have accessed our database have made their changes correctly.

Catch-22 updates#

Many developers avoid foreign key constraints because the constraints make it inconvenient to update related columns in multiple tables. For instance, if we need to delete a row that other rows depend on, we have to delete the child rows first to avoid violating foreign key constraints:

Deleting a dependent row

The following code would give an error. Try it yourself!

Deleting a dependent row

You can also test different commands in the widget given at the end of the lesson.

Now let’s delete the record in the parent row; the child row will now be deleted successfully.

Deleting a dependent row

We have to execute multiple statements manually, one for each child table. If we add another child table to our database in the future, we have to fix our code to delete it from the new table too. But this problem is solvable.

The unsolvable problem is when we UPDATE a column that child rows depend on. We can’t update the child rows before we update the parent, and we can’t update the parent before we update the child values that reference it. We need to make both changes simultaneously, but that’s impossible using two separate updates. It’s a “catch-22” scenario.

Catch-22 scenario

Let’s try it ourselves in the following playground. We’ll run the code UPDATE Bugs SET status = 'INVALID' WHERE status = 'BOGUS'; in the following playground in order to check its working.

Catch-22 scenario

Some developers find these scenarios difficult to manage, so they decide not to use foreign keys at all. We’ll see later how foreign keys address multi-table updates and deletions in a simple and effective way.

Try it yourself#

The data for Accounts, Products, and BugStatus is already available in the database. You can do the following in the given playground:

  • Retrieve data from any of the database tables, i.e., Accounts, Products, and BugStatus.
  • Insert more data in database tables or insert some data in the Bugs table because it is empty.
  • Retrieve data after insertion.
  • Delete the data from the database or tables.
Try it yourself editor
Synopsis: Keyless Entry
Solution: Declare Constraints
Mark as Completed
Report an Issue